Inducing Discourse Connectives from Parallel Texts
نویسندگان
چکیده
Discourse connectives (e.g. however, because) are terms that explicitly express discourse relations in a coherent text. While a list of discourse connectives is useful for both theoretical and empirical research on discourse relations, few languages currently possess such a resource. In this article, we propose a new method that exploits parallel corpora and collocation extraction techniques to automatically induce discourse connectives. Our approach is based on identifying candidates and ranking them using Log-Likelihood Ratio. Then, it relies on several filters to filter the list of candidates, namely: Word-Alignment, POS patterns, and Syntax. Our experiment to induce French discourse connectives from an English-French parallel text shows that Syntactic filter achieves a much higher MAP value (0.39) than the other filters, when compared with LEXCONN resource.
منابع مشابه
Translating Implicit Discourse Connectives Based on Cross-lingual Annotation and Alignment
Implicit discourse connectives and relations are distributed more widely in Chinese texts, when translating into English, such connectives are usually translated explicitly. Towards ChineseEnglish MT, in this paper we describe cross-lingual annotation and alignment of discourse connectives in a parallel corpus, describing related surveys and findings. We then conduct some evaluation experiments...
متن کاملHow Comparable are Parallel Corpora? Measuring the Distribution of General Vocabulary and Connectives
In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between various sub-parts. We compare results obtained using a general measure of lexical similarity based on χ and by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general meas...
متن کاملThe Penn Discourse TreeBank as a Resource for Natural Language Generation
While many advances have been made in Natural Language Generation (NLG), the scope of the field has been somewhat restricted because of the lack of annotated corpora from which properties of texts can be automatically acquired and applied towards the development of generation systems. In this paper, we describe how the Penn Discourse TreeBank (PDTB) can serve as a valuable large scale annotated...
متن کاملDiscourse-level features for statistical machine translation
The talk will show how the disambiguation of discourse connectives can improve their automatic translation. Connectives are a class of frequent functional lexical items that play an important role in text readability and coherence. Longer-range context is taken into account to learn the signaled rhetorical relations. The labels obtained from a discourse connective classifier are then integrated...
متن کاملAutomatic Disambiguation of French Discourse Connectives
Discourse connectives (e.g. however, because) are terms that can explicitly convey a discourse relation within a text. While discourse connectives have been shown to be an effective clue to automatically identify discourse relations, they are not always used to convey such relations, thus they should first be disambiguated between discourse-usage and non-discourse-usage. In this paper, we inves...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014